Engineering posts about ETL Pipelines
Curated summaries and key learnings for engineers working with ETL pipelines.
How World Bank Group uses Databricks to eradicate poverty through shared knowledge
The World Bank Group has developed a unified data and AI platform on Databricks to integrate structured operational data with unstructured documents, thereby eliminating manual research bottlenecks....
Scaling for MHHS: how Octopus Energy achieved a 50x cost reduction in margin data engineering
The article discusses the significant data engineering challenges faced by Octopus Energy as the UK transitions to a Market-wide Half-Hourly Settlement (MHHS) model, which increases the frequency of...
Unlock seamless and cost-effective marketing campaigns with Lakebase
The article discusses the implementation and benefits of Lakebase, an architecture that combines the advantages of transactional databases with the flexibility of data lakes. It highlights the...
How to Build Real-Time Fraud Detection using Spark Real-Time Mode and Lakebase
This article discusses the implementation of a real-time fraud detection system leveraging Apache Spark's Real-Time Mode (RTM) and Lakebase on the Databricks platform. It highlights the challenges of...
Automate Data & KPI Monitoring with SQL Alerts
The article introduces Databricks SQL Alerts, a tool designed to automate data monitoring and KPI tracking within organizations. It highlights the challenges of manual monitoring processes and...
Announcing the Databricks analytics engineer learning pathway
The Databricks Analytics Engineer Learning Pathway is designed to equip SQL practitioners with the skills necessary to transform raw data into governed, AI-ready semantic models and metrics. The...
Backstage with Lakebase, part 2
In this second part of the series, the article discusses the integration of Backstage with Databricks Lakebase, emphasizing the transformation of database management from a complex, multi-service...
Expanded interoperability with Unity Catalog Open APIs
The article elaborates on the advancements brought by Unity Catalog's Open APIs, which enhance interoperability in data management by allowing enterprises to maintain a single copy of data while...
Clinical operations intelligence belongs on the Lakehouse
The article presents the Site Feasibility Workbench, an open-source application designed to enhance clinical operations intelligence by integrating data, models, and applications within a single...
Data quality is the AI strategy
The article emphasizes the critical role of data quality in leveraging AI effectively within healthcare systems. It highlights NYU Langone Health's strategic approach to data management, where the...
The Convergence of Open Table Formats and Open Catalogs: Catalog Commits is Generally Available
The article announces the General Availability of Catalog Commits, a significant enhancement for Delta Lake and Unity Catalog that aims to unify the lakehouse architecture by addressing coordination...
How CFOs in consulting can recover margin with Databricks
The article outlines the financial challenges faced by consulting firms, particularly in managing data across disparate systems, which leads to inefficiencies and margin pressures. It emphasizes the...
The Rise of Sports Intelligence: How the Lakehouse Turns Tracking Data into Competitive Advantage
The article explores the transformative impact of the Databricks Data Intelligence Platform on professional sports through the integration of vast amounts of tracking and biomechanical data. It...
Migrating Data Ingestion Systems at Meta Scale
The article outlines the comprehensive migration of Meta's data ingestion system, which was essential for maintaining the efficiency and reliability of their social graph data processing. It details...
Growth Analytics Is What Comes After Growth Hacking
The article explores the evolution of growth analytics as a critical component in modern user acquisition strategies. It highlights the shift from tactical growth hacking to a more analytical...
Why telecom churn prediction misses the intervention window
The article explores the challenges faced by telecommunications companies in effectively predicting and intervening in customer churn. Despite the sophistication of churn propensity models,...
Operating room utilization is hiding in your scheduling data
The article highlights the critical importance of operating room (OR) utilization in healthcare systems, emphasizing that underutilized ORs represent significant revenue losses and unmet patient...
Energy trading analytics in a real-time market
The article highlights the challenges faced in energy trading analytics due to the fast-paced nature of price changes and the limitations of traditional batch processing methods. It emphasizes the...
Peril Predicts: Precision Payouts for a Volatile World
The article explores the implementation of parametric insurance, which automates payouts based on predefined conditions triggered by objective event data. It highlights the role of modern catastrophe...
How nOps Rebuilt Their Cloud Optimization Platform on Databricks Lakebase, and Why Other ISVs Should Too
The article outlines how nOps moved their cloud optimization platform to Databricks Lakebase, a fully managed PostgreSQL database integrated with the Databricks Lakehouse. This...